A Novel Integrated Classifier for Handling Data Warehouse Anomalies
Authors
Abstract
Anomalies persist in databases used across various commercial sectors and undermine the overall integrity of the data. Typically, Duplicate, Wrong and Missed observations of spatial-temporal data prevent users from accurately utilising recorded information. The literature describes various data cleaning methods, which fall into either deterministic or probabilistic approaches. However, we believe that, to ensure maximum integrity, a data cleaning methodology must have properties of both categories to eliminate anomalies effectively. To realise this, we propose a method that integrates deterministic and probabilistic classifiers using fusion techniques. We have empirically evaluated the proposed concept against state-of-the-art techniques and found that our approach improves the integrity of the resulting data set.
Similar resources
A Comprehensive Mathematical Model for the Design of a Dynamic Cellular Manufacturing System Integrated with Production Planning and Several Manufacturing Attributes
Keywords: Dynamic cellular manufacturing systems, Mixed-integer non-linear programming, Production planning, Manufacturing attributes
This paper presents a novel mixed-integer non-linear programming model for the design of a dynamic cellular manufacturing system (DCMS) based on production planning (PP) decisions and several manufacturing attributes. Such an integrated DCMS model with an extensi...
SUBCLASS FUZZY-SVM CLASSIFIER AS AN EFFICIENT METHOD TO ENHANCE THE MASS DETECTION IN MAMMOGRAMS
This paper is concerned with the development of a novel classifier for automatic mass detection of mammograms, based on contourlet feature extraction in conjunction with statistical and fuzzy classifiers. In this method, mammograms are segmented into regions of interest (ROI) in order to extract features including geometrical and contourlet coefficients. The extracted features benefit from...
Formal approach to modelling a multiversion data warehouse
A data warehouse (DW) is a large centralized database that stores data integrated from multiple, usually heterogeneous, external data sources (EDSs). DW content is processed by so-called On-Line Analytical Processing applications that analyze business trends and discover anomalies and hidden dependencies between data. These applications are part of decision support systems. EDSs constantly change ...
A Novel Type-2 Adaptive Neuro Fuzzy Inference System Classifier for Modelling Uncertainty in Prediction of Air Pollution Disaster (RESEARCH NOTE)
Type-2 fuzzy set theory is one of the most powerful tools for dealing with the uncertainty and imperfection in dynamic and complex environments. The applications of type-2 fuzzy sets and soft computing methods are rapidly emerging in the ecological fields such as air pollution and weather prediction. The air pollution problem is a major public health problem in many cities of the world. Predict...
A Novel Ensemble Approach for Anomaly Detection in Wireless Sensor Networks Using Time-overlapped Sliding Windows
One of the most important issues concerning sensor data in Wireless Sensor Networks (WSNs) is the unexpected data acquired from the sensors. Today, there are numerous approaches for detecting anomalies in WSNs, most of which are based on machine learning methods. In this research, we present a heuristic method based on the concept of “ensemble of classifiers” of data minin...